Unsupervised Syntax-Based Machine Translation: The Contribution of Discontiguous Phrases
نویسنده
چکیده
We present a new unsupervised syntax-based MT system, termed U-DOT, which uses the unsupervised U-DOP model for learning paired trees, and which computes the most probable target sentence from the relative frequencies of paired subtrees. We test U-DOT on the German-English Europarl corpus, showing that it outperforms the state-of-the-art phrase-based Pharaoh system. We demonstrate that the inclusion of noncontiguous phrases significantly improves the translation accuracy. This paper presents the first translation results with the data-oriented translation (DOT) model on the Europarl corpus, to the best of our knowledge. Introduction: Phrase-Based vs Syntax-Based
منابع مشابه
Deep Syntax Language Models and Statistical Machine Translation
Hierarchical Models increase the reordering capabilities of MT systems by introducing non-terminal symbols to phrases that map source language (SL) words/phrases to the correct position in the target language (TL) translation. Building translations via discontiguous TL phrases increases the difficulty of language modeling, however, introducing the need for heuristic techniques such as cube prun...
متن کاملPhrase Dependency Machine Translation with Quasi-Synchronous Tree-to-Tree Features
Recent research has shown clear improvement in translation quality by exploiting linguistic syntax for either the source or target language. However, when using syntax for both languages (“tree-to-tree” translation), there is evidence that syntactic divergence can hamper the extraction of useful rules (Ding and Palmer 2005). Smith and Eisner (2006) introduced quasi-synchronous grammar, a formal...
متن کاملQuasi-Synchronous Phrase Dependency Grammars for Machine Translation
We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text u...
متن کاملمدل ترجمه عبارت-مرزی با استفاده از برچسبهای کمعمق نحوی
Phrase-boundary model for statistical machine translation labels the rules with classes of boundary words on the target side phrases of training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels including POS tags and chunk labels. With the priority of chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...
متن کاملString-to-Tree Multi Bottom-up Tree Transducers
We achieve significant improvements in several syntax-based machine translation experiments using a string-to-tree variant of multi bottom-up tree transducers. Our new parameterized rule extraction algorithm extracts string-to-tree rules that can be discontiguous and non-minimal in contrast to existing algorithms for the tree-to-tree setting. The obtained models significantly outperform the str...
متن کامل